10 research outputs found

    Deep reinforcement learning based Evasion Generative Adversarial Network for botnet detection

    Botnet detectors based on machine learning are potential targets for adversarial evasion attacks. Several research works employ adversarial training with samples generated from generative adversarial networks (GANs) to make botnet detectors adept at recognising adversarial evasions. However, the synthetic evasions may not follow the original semantics of the input samples. This paper proposes a novel GAN model leveraged with deep reinforcement learning (DRL) to explore semantic-aware samples and simultaneously harden detection. A DRL agent is used to attack the discriminator of the GAN, which acts as the botnet detector. The agent trains the discriminator on the crafted perturbations during GAN training, which helps the GAN generator converge earlier than it would without DRL. We name this model RELEVAGAN ("relieve a GAN", i.e. deep REinforcement Learning-based Evasion Generative Adversarial Network) because, with the help of DRL, it eases the GAN's job by letting its generator explore evasion samples within the semantic limits. During GAN training, attacks are conducted to adjust the discriminator weights so that it learns the perturbations crafted by the agent. RELEVAGAN does not require adversarial training for ML classifiers since it can itself act as an adversarial semantic-aware botnet detection model. The code will be available at https://github.com/rhr407/RELEVAGAN.
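The core loop described above, an agent crafting bounded perturbations while the detector is retrained on the ones that slip through, can be sketched in a few lines. This is a deliberately simplified stand-in (a random-search "agent" and a fixed linear detector with invented weights and bounds), not RELEVAGAN's actual architecture:

```python
import random

random.seed(0)

# Toy linear "discriminator": flows scoring above the threshold are flagged
# as botnet traffic. Weights and threshold are invented for illustration.
weights = [0.5, 0.3, 0.2, 0.1]
THRESHOLD = 0.5

def detect(x):
    return sum(w * v for w, v in zip(weights, x)) > THRESHOLD

# "Semantic limits": each feature may only move within [0, 1], so the
# crafted sample remains a plausible flow.
def perturb(x, i, delta):
    y = list(x)
    y[i] = min(1.0, max(0.0, y[i] + delta))
    return y

botnet_sample = [0.9, 0.8, 0.7, 0.6]
evasions_found = 0

# Simplistic stand-in for the DRL agent: randomly pick a feature to shrink;
# every successful evasion immediately hardens the detector by nudging its
# weights toward re-flagging the crafted sample.
for episode in range(200):
    i = random.randrange(4)
    crafted = perturb(botnet_sample, i, -random.random())
    if not detect(crafted):                    # evasion succeeded
        evasions_found += 1
        for j in range(4):
            weights[j] += 0.05 * crafted[j]    # online hardening step

print("evasions found before the detector hardened:", evasions_found)
print("original sample still detected:", detect(botnet_sample))
```

As training proceeds, each successful evasion raises the detector's weights, so later perturbations within the same bounds stop working: a miniature version of the generator-hardens-discriminator dynamic the abstract describes.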

    An Investigation on Fragility of Machine Learning Classifiers in Android Malware Detection

    Machine learning (ML) classifiers have been increasingly used in Android malware detection and countermeasures over the past decade. However, ML-based solutions are vulnerable to adversarial evasion attacks: an attacker can carefully craft a malicious sample to fool an underlying pre-trained classifier. In this paper, we highlight the fragility of ML classifiers against adversarial evasion attacks. Using our proposed methodology, we perform Oracle-based and Generative Adversarial Network (GAN)-based mimicry attacks against these classifiers. We use static analysis of Android applications to extract API-based features from a balanced excerpt of a well-known public dataset. The empirical results demonstrate that, among the ML classifiers, the detection capability of linear classifiers can be reduced to as low as 0% by perturbing only up to 4 of the 315 extracted API features. As a countermeasure, we propose TrickDroid, a cumulative adversarial training scheme based on Oracle- and GAN-based adversarial data to improve evasion detection. The experimental results of cumulative adversarial training show a remarkable detection accuracy of up to 99.46% against adversarial samples.
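The headline result, a linear classifier driven to zero detection by flipping at most 4 of 315 binary API features, can be illustrated with a toy linear scorer. The weights, feature indices and flip strategy below are invented for illustration; the paper's actual attack and feature set differ:

```python
import random

random.seed(1)
N_FEATURES = 315  # same feature count as in the paper

# Hypothetical linear malware scorer over binary API features:
# score = w . x + b; score > 0 -> classified as malware.
weights = [random.uniform(-0.05, 0.05) for _ in range(N_FEATURES)]
for idx in (3, 17, 42, 99):          # assumed "telltale" API features
    weights[idx] = 2.0
bias = -1.0

def is_malware(x):
    return sum(w * v for w, v in zip(weights, x)) + bias > 0

malware = [0] * N_FEATURES
for idx in (3, 17, 42, 99):
    malware[idx] = 1                 # the sample calls the telltale APIs

# Mimicry-style perturbation under a 4-feature budget: flip off the present
# features that contribute the largest positive weight to the score.
present = sorted((i for i in range(N_FEATURES) if malware[i]),
                 key=lambda i: weights[i], reverse=True)
evasive = list(malware)
for i in present[:4]:
    evasive[i] = 0

print("original flagged as malware:", is_malware(malware))
print("evades after flipping 4 features:", not is_malware(evasive))
```

Because a linear model's score is a sum of per-feature contributions, an attacker who knows (or approximates) the weights can rank features by contribution and spend a tiny perturbation budget where it hurts most, which is why linear classifiers fare worst in the paper's experiments.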

    Security Hardening of Botnet Detectors Using Generative Adversarial Networks

    Machine learning (ML) based botnet detectors are no exception among ML models when it comes to adversarial evasion attacks. The datasets used to train these models also suffer from scarcity and imbalance issues. We propose a new technique named Botshot, based on generative adversarial networks (GANs), for addressing these issues and proactively making botnet detectors aware of adversarial evasions. Botshot is cost-effective compared to network emulation for generating botnet traffic data, rendering dedicated hardware resources unnecessary. First, we use an extended set of network-flow and time-based features for three publicly available botnet datasets. Second, we utilize two GANs (vanilla and conditional) to generate realistic botnet traffic. We evaluate generator performance using the classifier two-sample test (C2ST) with a 10-fold 70-30 train-test split and propose using 'recall' instead of 'accuracy' for proactively learning adversarial evasions. We then augment the training set with the generated data and test using the unchanged test set. Last, we compare our results against benchmark oversampling methods, with additional botnet traffic data as augmentation, in terms of average accuracy, precision, recall and F1 score over six different ML classifiers. The empirical results demonstrate the effectiveness of GAN-based oversampling for learning adversarial evasion attacks on botnet detectors in advance.
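The classifier two-sample test mentioned above can be sketched as follows: train any classifier to separate real from generated samples and read its held-out accuracy, where a value near 0.5 means the generator's output is indistinguishable from real data. This toy version uses a 1-NN classifier on invented 2-D "flow features" with a single 70-30 split rather than the paper's 10-fold setup:

```python
import random

random.seed(2)

def sample_real(n):
    # Stand-in for real botnet flow features (2-D toy distribution).
    return [(random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)) for _ in range(n)]

def sample_generated(n, shift):
    # Stand-in for GAN output; `shift` controls how far it is from real data.
    return [(random.gauss(shift, 1.0), random.gauss(shift, 1.0)) for _ in range(n)]

def c2st_accuracy(real, fake):
    # Classifier two-sample test with a 1-NN classifier and a 70-30 split:
    # accuracy near 0.5 -> the classifier cannot tell real from generated.
    data = [(x, 0) for x in real] + [(x, 1) for x in fake]
    random.shuffle(data)
    cut = int(0.7 * len(data))
    train, test = data[:cut], data[cut:]

    def predict(p):
        nearest = min(train,
                      key=lambda t: (t[0][0] - p[0])**2 + (t[0][1] - p[1])**2)
        return nearest[1]

    hits = sum(predict(p) == label for p, label in test)
    return hits / len(test)

good = c2st_accuracy(sample_real(200), sample_generated(200, shift=0.0))
bad = c2st_accuracy(sample_real(200), sample_generated(200, shift=5.0))
print(f"C2ST accuracy, good generator: {good:.2f}")  # near 0.5
print(f"C2ST accuracy, poor generator: {bad:.2f}")   # near 1.0
```

Any classifier can play the discriminating role in a C2ST; 1-NN is used here only because it needs no training loop.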

    Evasion-aware botnet detection using artificial intelligence

    Adversarial evasions are a modern threat to Machine Learning (ML) based applications. Due to vulnerabilities in classic ML inference systems, botnet detectors are equally likely to be attacked using adversarial examples. Such evasions can be generated using sophisticated attack strategies built on complex AI models; generative AI models are one candidate for spawning evasion attacks. The other significant concern is data scarcity, due to which ML classifiers become biased toward majority-class samples during training. This research proposes novel techniques to improve the detection accuracy of botnet classifiers and mitigate the effects of adversarial evasion and data scarcity. The ultimate goal is to design a sophisticated botnet detector that is adversarially aware in low data regimes. First, the technical background of the research is presented to help understand the problem and the potential solutions. Second, a Generative Adversarial Network (GAN) based model called Botshot is proposed to address dataset imbalance issues using Adversarial Training (AT). Botshot gives promising results in contrast to the most popular ML classifiers, which can be fooled by adversarial samples 100% of the time; in the best case, Botshot improves detection accuracy to 99.74% after AT on one of the botnet datasets used. Third, an evasion-aware model called Evasion Generative Adversarial Network (EVAGAN) for botnet detection in low data regimes is presented. EVAGAN outperforms the state-of-the-art Auxiliary Classifier Generative Adversarial Network (ACGAN) in detection performance, stability, and time complexity. Last, an improved version of EVAGAN called deep REinforcement Learning-based Evasion Generative Adversarial Network (RELEVAGAN) is proposed to further harden EVAGAN against evasion attacks while preserving attack-sample semantics. The rationale is to proactively attack the detection model with a Deep Reinforcement Learning (DRL) agent to discover possible unseen functionality-preserving adversarial evasions. The attacks are conducted during EVAGAN training to adjust the network weights for learning perturbations crafted by the agent. RELEVAGAN, like its parent model, does not require AT for ML classifiers since it can itself act as an adversarial-aware botnet detection model. The experimental results show RELEVAGAN's supremacy over EVAGAN in terms of early convergence. RELEVAGAN is one step further toward designing an evasion-aware and functionality-preserving botnet detection model that remains sustainable against evolving botnets with the help of DRL.

    Evasion Generative Adversarial Network for Low Data Regimes

    A myriad of recent works have leveraged generative adversarial networks (GANs) to generate unseen evasion samples. The purpose is to augment the original training set with the generated data for adversarial training, improving the detection performance of machine learning (ML) classifiers. The quality of generated adversarial samples relies on the adequacy of the training data. However, in low data regimes such as medical diagnostic imaging and cybersecurity, anomaly samples are scarce. This paper proposes a novel GAN design called Evasion Generative Adversarial Network (EVAGAN), which is better suited to low data regime problems that use oversampling to improve the detection of ML classifiers. EVAGAN can not only generate evasion samples; its discriminator can also act as an evasion-aware classifier. We consider Auxiliary Classifier GAN (ACGAN) as a benchmark to evaluate the performance of EVAGAN on cybersecurity botnet datasets (ISCX-2014, CIC-2017 and CIC-2018) and a computer vision dataset (MNIST). We demonstrate that EVAGAN outperforms ACGAN on unbalanced datasets with respect to detection performance, training stability and time complexity. EVAGAN's generator quickly learns to generate the low-sample class and simultaneously hardens its discriminator. In contrast to ML classifiers, which require security hardening by adversarial training on GAN-generated data, EVAGAN makes such hardening unnecessary. The experimental analysis shows that EVAGAN is an efficient evasion-hardened model for low data regimes on the selected cybersecurity and computer vision datasets. Code will be available at https://www.github.com/rhr407/EVAGAN.
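A quick illustration of why the low data regimes EVAGAN targets call for more than raw accuracy: on a heavily imbalanced test set, a classifier that never flags the scarce minority (attack) class still scores high accuracy while its recall on attacks is zero. All numbers below are invented:

```python
# Confusion-matrix bookkeeping: tp/fn count attack samples, tn/fp normal ones.
def evaluate(tp, fn, tn, fp):
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall

normal, attacks = 990, 10   # 1% minority class, as in a low data regime

# "Majority-vote" classifier: labels everything as normal traffic.
acc_majority, rec_majority = evaluate(tp=0, fn=attacks, tn=normal, fp=0)
print(f"majority classifier  accuracy={acc_majority:.3f}  "
      f"attack recall={rec_majority:.3f}")

# Evasion-aware classifier that catches 9 of the 10 attacks,
# at the cost of 10 false positives.
acc_aware, rec_aware = evaluate(tp=9, fn=1, tn=980, fp=10)
print(f"evasion-aware model  accuracy={acc_aware:.3f}  "
      f"attack recall={rec_aware:.3f}")
```

The two models have nearly identical accuracy (0.990 vs. 0.989), yet one misses every attack; this is the imbalance pathology that minority-class generation and recall-oriented evaluation are meant to address.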

    AndroMalPack: Enhancing the ML-based Malware Classification by Detection and Removal of Repacked Apps for Android Systems

    Due to the widespread usage of Android smartphones in the present era, Android malware has become a grave security concern. The research community relies on publicly available datasets to keep pace with evolving malware. However, a plethora of apps in those datasets are mere clones of previously identified malware: instead of creating novel versions, malware authors generally repack existing malicious applications to create malware clones with minimal effort and expense. This paper investigates three benchmark Android malware datasets to quantify repacked malware using package name-based similarity. We consider 5560 apps from the Drebin dataset, 24,533 apps from the AMD dataset and 695,470 apps from the AndroZoo dataset for analysis. Our analysis reveals that 52.3% of apps in Drebin, 29.8% in AMD and 42.3% in AndroZoo are repacked malware. Furthermore, we present AndroMalPack, an Android malware detector trained on clone-free datasets and optimized using nature-inspired algorithms. Although trained on reduced versions of the datasets, AndroMalPack classifies novel and repacked malware with a remarkable detection accuracy of up to 98.2% and meagre false-positive rates. Finally, we publish a dataset of cloned apps in Drebin, AMD, and AndroZoo to foster research in the repacked-malware analysis domain.
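The package name-based quantification of repacked apps can be sketched with exact-name matching: group apps by package name and count everything beyond the first occurrence per name as a repacked clone. The apps below are invented, and the paper's similarity measure may be more elaborate than exact matching:

```python
from collections import Counter

# Toy corpus of (app_id, package_name) pairs; in practice the package name
# is read from each APK's manifest. All names here are hypothetical.
apps = [
    ("a1", "com.evil.sms"),
    ("a2", "com.evil.sms"),      # repacked clone of a1
    ("a3", "com.evil.sms"),      # another clone
    ("a4", "com.spy.locker"),
    ("a5", "com.spy.locker"),    # clone of a4
    ("a6", "org.benign.tool"),
]

counts = Counter(pkg for _, pkg in apps)

# Keep one representative per package name; everything beyond the first
# occurrence counts as a repacked duplicate.
uniques = len(counts)
repacked = len(apps) - uniques
print(f"repacked apps: {repacked} of {len(apps)} "
      f"({100 * repacked / len(apps):.1f}%)")
```

Training on the deduplicated set (one app per package name) is what the abstract means by a "clone-free" dataset: the classifier sees each malware family's behaviour once instead of being skewed toward heavily repacked families.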

    Students' participation in collaborative research should be recognised

    Letter to the editor